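Understanding how the events described or shown in multimedia content relate to one another is a key component of developing robust artificial intelligence systems that can reason about real-world media. Although much research has been devoted to event understanding in the text, image, and video domains, none has explored the complex relations that events exhibit across domains. For example, a news article may describe a "protest" event while a video shows an "arrest" event. Recognizing that the visual "arrest" event is a subevent of the broader "protest" event is a challenging yet important problem that prior work has not explored. In this paper, we propose the novel task of multimodal event-event relations to recognize such cross-modal event relations. We contribute a large-scale dataset consisting of 100K video and news article pairs, as well as a benchmark of densely annotated data. We also propose a weakly supervised multimodal method that integrates commonsense knowledge from an external knowledge base (KB) to predict rich multimodal event hierarchies. Experiments show that our model outperforms a number of competitive baselines on our proposed benchmark. We also perform a detailed analysis of our model's performance and suggest directions for future research.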
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
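For readers unfamiliar with the term, K-fold cross-validation partitions the training set into K folds and uses each fold once for validation while training on the rest. The following is a minimal sketch of the index split only; the function name `k_fold_indices` is hypothetical and nothing here is taken from the survey or from any participant's code. It uses contiguous folds for clarity, whereas a practical setup would usually shuffle the indices first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, val) index lists for k contiguous folds over range(n_samples).

    Illustrative helper (hypothetical name): fold sizes differ by at most one,
    and every sample index appears in exactly one validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


if __name__ == "__main__":
    for train_idx, val_idx in k_fold_indices(10, 3):
        print(len(train_idx), "training samples; validation indices:", val_idx)
```

Averaging a model's validation score over the K splits gives a less noisy performance estimate than a single hold-out split, at the cost of K training runs.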
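We present a variety of functions and classes that can be used to improve the sampling of parameter spaces with the assistance of machine learning. Special attention is paid to setting sane defaults, with the goal that the adjustments required by different problems remain minimal. This collection of routines can be used for different types of analysis, from finding the bounds of a parameter space to accumulating samples in regions of interest. In particular, we discuss two methods assisted by incorporating different machine learning models: regression and classification. We show that a machine learning classifier can provide higher efficiency when exploring the parameter space. In addition, we introduce a boosting technique to improve the slow convergence at the start of the process. The use of these routines is best explained with the help of a few examples that illustrate the type of results one can obtain. We also include samples of the code used to obtain the examples, as well as descriptions of the adjustments that can be made to adapt the calculation to other problems. We conclude by showing the impact of these techniques when exploring the parameter space of the two-Higgs-doublet model that matches the measured Higgs boson signal strength. The code used for this paper and instructions on how to use it are available on the web.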
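The Internet of Things (IoT) is transforming industry by bridging the gap between Information Technology (IT) and Operational Technology (OT). Machines are being integrated with connected sensors and managed by intelligent analytics applications, accelerating digital transformation and business operations. Bringing machine learning (ML) to industrial devices is an advance that aims to promote the convergence of IT and OT. However, developing ML applications in the Industrial IoT (IIoT) presents various challenges, including hardware heterogeneity, non-standardized representations of ML models, compatibility issues between devices and ML models, and slow application development. Successful deployment in this area requires a deep understanding of hardware, algorithms, software tools, and applications. This paper therefore presents a framework called Semantic Low-Code Engineering for ML Applications (SELOC-ML), built on a low-code platform, that leverages Semantic Web technologies to support the rapid development of ML applications in the IIoT. SELOC-ML enables non-experts to easily model, discover, reuse, and matchmake ML models and devices. Based on the matching results, project code can be generated automatically for deployment on hardware. Developers can benefit from semantic application templates, called recipes, to rapidly prototype end-user applications. In an industrial ML classification case study, the evaluation confirms a reduction in engineering effort by a factor of at least three compared with traditional approaches, showing the efficiency and usefulness of SELOC-ML. We share the code and welcome any contributions.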
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
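The link between the marginal likelihood and the posterior that the abstract alludes to can be written as a standard identity. The notation below is an assumption made for illustration and is not taken from the tutorial: $x$ denotes the observed data, $z$ the latent variables, and $q_\lambda(z)$ a variational family with parameters $\lambda$.

```latex
\begin{align}
\log p(x)
  &= \underbrace{\mathbb{E}_{q_\lambda(z)}\!\left[\log p(x, z) - \log q_\lambda(z)\right]}_{\mathrm{ELBO}(\lambda)}
   + \mathrm{KL}\!\left(q_\lambda(z) \,\|\, p(z \mid x)\right), \\
\mathrm{ELBO}(\lambda)
  &\le \log p(x)
  \quad \text{since } \mathrm{KL}(\cdot \,\|\, \cdot) \ge 0 .
\end{align}
```

Because $\log p(x)$ does not depend on $\lambda$, maximizing the evidence lower bound (ELBO) over $\lambda$ tightens the bound on the marginal likelihood and, equivalently, minimizes the KL divergence from $q_\lambda(z)$ to the posterior $p(z \mid x)$.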
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
The performance of inertial navigation systems is largely dependent on the stable flow of external measurements and information to guarantee continuous filter updates and bound the inertial solution drift. Platforms in different operational environments may be prevented at some point from receiving external measurements, thus exposing their navigation solution to drift. Over the years, a wide variety of works have been proposed to overcome this shortcoming by exploiting knowledge of the system's current conditions and turning it into an applicable source of information to update the navigation filter. This paper aims to provide an extensive survey of information aided navigation, broadly classified into direct, indirect, and model aiding. Each approach is described by the notable works that implemented its concept, use cases, relevant state updates, and their corresponding measurement models. By matching the appropriate constraint to a given scenario, one will be able to improve the navigation solution accuracy, compensate for the lost information, and uncover certain internal states that would otherwise remain unobservable.